- North America > United States > California (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Europe > Austria > Styria > Graz (0.04)
- (4 more...)
- Law (1.00)
- Information Technology > Security & Privacy (0.67)
a7c4163b33286261b24c72fd3d1707c9-Supplemental-Datasets_and_Benchmarks.pdf
These datasets enable large-scale study of abuse detection for these languages. Anonymized comments: To further address privacy concerns, we anonymize our dataset. We combine the hate and offensive categories in these datasets for training a binary classification model. We show the percentage (%) of emoticons present in our dataset MACD in Table 12. In future work, we will investigate in detail the impact of emoticons on abuse detection. However, due to the limited scale and diversity of abuse detection datasets in Indic languages, the development of these models for Indic languages has been severely impeded.
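The merging of the hate and offensive categories into a single positive class for binary classification can be sketched as a simple label mapping. This is a minimal illustration under stated assumptions, not the authors' code. The label strings (`hate`, `offensive`, `normal`) and the helper names `to_binary` and `binarize` are hypothetical. Every category other than hate or offensive is assumed to map to the non-abusive class.

```python
# Hypothetical label names: the source only states that the "hate" and
# "offensive" categories are combined for binary classification.
ABUSIVE_LABELS = {"hate", "offensive"}


def to_binary(label: str) -> int:
    """Map a multi-class label to 1 (abusive) or 0 (assumed non-abusive)."""
    return 1 if label.strip().lower() in ABUSIVE_LABELS else 0


def binarize(labels):
    """Apply the binary mapping to a list of multi-class labels."""
    return [to_binary(label) for label in labels]


if __name__ == "__main__":
    sample = ["hate", "normal", "offensive", "Normal"]
    print(binarize(sample))  # [1, 0, 1, 0]
```

Normalizing case and whitespace before the lookup is a defensive choice for inconsistently annotated labels. Whether the actual datasets need it is not stated in the source.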
- Law (0.47)
- Information Technology (0.34)
Supplementary Material: CARLANE: A Lane Detection Benchmark for Unsupervised Domain Adaptation from Simulation to Multiple Real-World Domains
Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set? If the dataset is a sample, then what is the larger set? Is the sample representative of the larger set (e.g., geographic coverage)? If so, please describe how this representativeness was validated/verified.
Supplementary Material: WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Models
This has been addressed in prior work [4, 3] by fine-tuning VLMs on a given corpus for a given task [5] and conducting zero-shot evaluations on a new corpus [7]. However, the mere use of an unseen corpus for evaluation does not imply it is out-of-distribution (OOD). Q1: What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)? Please provide a description. (a) We provide 384k image-text pairs. Q3: Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set? If the dataset is a sample, then what is the larger set?
DATASHEET: MOTIVE
Please see the most up-to-date version here. Was there a specific task in mind? Was there a specific gap that needed to be filled? The MOTIVE dataset was created to promote the development of new drug-target interaction (DTI) prediction models based on both the existing relationships between compounds and their protein targets and the similarity of JUMP Cell Painting morphological features of perturbed cells [2]. The MOTIVE dataset was created with the DTI task in mind, and addresses a lack of graph-based biological datasets with empirical node features. Who created this dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)? This dataset was created by the Carpenter-Singh Lab in the Imaging Platform at the Broad Institute of MIT and Harvard, Cambridge, Massachusetts. What support was needed to make this dataset? (If there is an associated grant, provide the name of the grantor and the grant name and number, or if it was supported by a company or government agency, give those details.) The authors gratefully acknowledge an internship from the Massachusetts Life Sciences Center (to ES).
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Government (1.00)
- Law (0.94)
- Asia > China > Liaoning Province > Shenyang (0.40)
- North America > Canada > Quebec > Montreal (0.14)
- North America > United States > New Jersey (0.04)
- (8 more...)
- Law (1.00)
- Government (1.00)
- Information Technology > Security & Privacy (0.93)
- Leisure & Entertainment (0.67)
- Information Technology (0.69)
- Law (0.69)
- Government (0.46)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Oceania > Australia (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)